Controlling Memory Access Concurrency in Efficient Fault-Tolerant Parallel Algorithms
Authors
Abstract
The CRCW PRAM under dynamic fail-stop (no restart) processor behavior is a fault-prone multiprocessor model for which it is possible to both guarantee reliability and preserve efficiency. To handle dynamic faults, some redundancy is necessary in the form of many processors concurrently performing a common read or write task. In this paper we show how to significantly decrease this concurrency by bounding it in terms of the number of actual processor faults. We describe a low-concurrency, efficient, and fault-tolerant algorithm for the Write-All primitive: "using N processors, write 1's into N locations". This primitive can serve as the basis for efficient fault-tolerant simulations of algorithms written for fault-free PRAMs on fault-prone PRAMs. For any dynamic failure pattern F, our algorithm has total write concurrency at most |F| and total read concurrency at most 7 |F| log N, where |F| is the number of processor faults (for example, there is no concurrency in a run without failures); note that previous algorithms used Ω(N log N) concurrency even in the absence of faults. We also describe a technique for limiting the per-step concurrency and present an optimal fault-tolerant EREW PRAM algorithm for Write-All when all processor faults are initial.
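To make the Write-All problem statement and its cost measures concrete, here is a minimal toy simulation in Python. It is not the algorithm from the paper: the rotation scheme, the function names (`excess_accesses`, `naive_write_all`), the `fail_at` failure-pattern encoding, and the global termination check are all assumptions made for illustration. Concurrency is counted here as accesses to a cell beyond the first in a step, which is one plausible reading of the abstract.

```python
from collections import Counter


def excess_accesses(targets):
    """Concurrency of one step: accesses beyond the first to each cell (assumed measure)."""
    return sum(count - 1 for count in Counter(targets).values())


def naive_write_all(n, fail_at):
    """Toy Write-All: at step t, live processor i writes 1 into cell (i + t) mod n.

    fail_at maps a processor id to the step at which it stops for good
    (fail-stop, no restart); processors not listed never fail.
    The `all(x)` termination check is a global oracle, which a real PRAM
    algorithm would have to implement itself.
    """
    x = [0] * n
    work = 0
    write_concurrency = 0
    t = 0
    while not all(x):
        live = [i for i in range(n) if fail_at.get(i, n + 1) > t]
        if not live:
            raise RuntimeError("all processors failed before completion")
        targets = [(i + t) % n for i in live]
        for cell in targets:
            x[cell] = 1
        work += len(live)                      # one unit of work per live processor
        write_concurrency += excess_accesses(targets)
        t += 1
    return x, t, work, write_concurrency


if __name__ == "__main__":
    print(naive_write_all(4, {}))              # failure-free run
    print(naive_write_all(4, {1: 0, 2: 0}))    # processors 1 and 2 fail initially
```

In this sketch the writes within a step never collide, so write concurrency stays at zero, but the price is wasted work: a single survivor needs N steps, and the scheme leans on the termination oracle. The abstract's point is that an efficient algorithm can keep concurrency proportional to the number of actual faults without giving up efficiency.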
Similar resources
Analysis of Instant and Total Memory Access Concurrency in Robust Parallel Algorithms
Algorithms in synchronous parallel models of computation with processor crashes can be made both efficient and fault-tolerant. The basis for fault-tolerance in such settings is the ability of multiple processors to concurrently read from and write to shared memory. Concurrent memory access provides redundancy that is necessary for combining fault-tolerance and efficiency. The model considered h...
Efficient Wait-Free Implementation of a Concurrent Priority Queue
Binary snapshots; Linear-time snapshot protocols for unbalanced systems; Towards a necessary and sufficient condition for wait-free synchronization; Efficient algorithms for checking the atomicity of a run of read and write operations; Benign failure models for shared memory; Generalized agreement between concurrent fail-stop processes; Controlling memory access co...
Failure-sensitive Analysis of Parallel Algorithms with Controlled Memory Access Concurrency
The abstract problem of using P failure-prone processors to cooperatively update all locations of an N-element shared array is called Write-All. Solutions to Write-All can be used iteratively to construct efficient simulations of PRAM algorithms on failure-prone PRAMs. Such use of Write-All in simulations is abstracted in terms of the iterative Write-All problem. The efficiency of the algorithmi...
Efficient, scalable consistency for highly fault-tolerant storage
Fault-tolerant storage systems spread data redundantly across a set of storage-nodes in an effort to preserve and provide access to data despite failures. One difficulty created by this architecture is the need for a consistent view, across storage-nodes, of the most recent update. Such consistency is made difficult by concurrent updates, partial updates made by clients that fail, and failures ...
Supporting Fault-Tolerant Parallel Programming in Linda
Linda is a language for programming parallel applications whose most notable feature is a distributed shared memory called tuple space. While suitable for a wide variety of programs, one shortcoming of the language as commonly defined and implemented is a lack of support for writing programs that can tolerate failures in the underlying computing platform. This paper describes FT-Linda, a versio...
Journal title:
- Nord. J. Comput.
Volume 2 Issue
Pages -
Publication date 1995